Margin-based Feature Selection Techniques for Support Vector Machine Classification
Abstract
Feature selection for classification in high-dimensional feature spaces can improve generalization accuracy and reduce classifier complexity, and it is also useful for identifying the important feature “markers”, e.g., biomarkers in a bioinformatics or biomedical context. For support vector machine (SVM) classification, a widely used feature selection technique is recursive feature elimination (RFE). In recent work, we demonstrated that the RFE objective is not generally consistent with the margin maximization objective that is central to the SVM learning approach. We thus proposed explicit margin-based feature elimination (MFE) for SVMs and demonstrated both improved margin and improved generalization accuracy compared with RFE for the case of linear SVMs. In this paper, after reviewing MFE, we first introduce an extension that achieves further gains in margin at small computational cost. This extension solves the SVM optimization problem to maximize the classifier’s margin at each feature elimination step, albeit in a lightweight fashion, by optimizing only two degrees of freedom – the weight vector’s slope and intercept. We next consider the case of a nonlinear kernel. We show that RFE, as defined for the nonlinear kernel case, assumes that the weight vector length is strictly decreasing as features are eliminated. We demonstrate experimentally that this assumption is not valid in general for the Gaussian kernel and that, consequently, RFE may give poor results in this case. An extension of MFE to the nonlinear kernel case gives both better margin and better generalization accuracy. This approach may help nonlinear kernel SVMs avoid overfitting and thus achieve better results than linear SVMs in some high-dimensional domains where the use of nonlinear kernels has not, to date, been found very favorable.
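The difference between the two elimination criteria for a linear SVM can be illustrated with a minimal sketch. It assumes the common formulation in which RFE removes the feature with the smallest squared weight, while a margin-based step removes the feature whose elimination (without retraining) leaves the largest margin. The weight vector, intercept, and data points below are hypothetical values chosen for illustration, not a trained SVM, and the function names are not taken from the paper.

```python
import math

def margin(w, b, X, y, active):
    """Smallest signed distance from any sample to the hyperplane,
    using only the features listed in `active`."""
    norm = math.sqrt(sum(w[j] ** 2 for j in active))
    return min(
        label * (sum(w[j] * x[j] for j in active) + b) / norm
        for x, label in zip(X, y)
    )

def rfe_choice(w, active):
    """RFE-style step: drop the feature with the smallest squared weight."""
    return min(active, key=lambda j: w[j] ** 2)

def mfe_choice(w, b, X, y, active):
    """Margin-based step: drop the feature whose removal leaves the largest margin."""
    return max(
        active,
        key=lambda j: margin(w, b, X, y, [k for k in active if k != j]),
    )

# Hypothetical linear classifier and a tiny, linearly separable data set.
w = [1.0, 0.5, 0.3]
b = 0.0
X = [(1, 0, 2), (1, -1, 2), (-1, 0, -2), (-1, 1, -2)]
y = [1, 1, -1, -1]
active = [0, 1, 2]

rfe_j = rfe_choice(w, active)
mfe_j = mfe_choice(w, b, X, y, active)
rfe_margin = margin(w, b, X, y, [k for k in active if k != rfe_j])
mfe_margin = margin(w, b, X, y, [k for k in active if k != mfe_j])
print("RFE removes feature", rfe_j, "leaving margin", round(rfe_margin, 3))
print("MFE removes feature", mfe_j, "leaving margin", round(mfe_margin, 3))
```

In this toy case the two criteria pick different features, and the margin-based choice leaves a larger margin. The sketch covers a single elimination step only; a full procedure would repeat the step and, as the abstract describes for the extension, could re-optimize the classifier between steps.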
Related resources
Fast SFFS-Based Algorithm for Feature Selection in Biomedical Datasets
Biomedical datasets usually include a large number of features relative to the number of samples. However, some data dimensions may be less relevant or even irrelevant to the output class. Selection of an optimal subset of features is critical, not only to reduce the processing cost but also to improve the classification results. To this end, this paper presents a hybrid method of filter and wr...
Support Vector Machine Based Facies Classification Using Seismic Attributes in an Oil Field of Iran
Seismic facies analysis (SFA) aims to classify similar seismic traces based on amplitude, phase, frequency, and other seismic attributes. SFA has proven useful in interpreting seismic data, allowing significant information on subsurface geological structures to be extracted. While facies analysis has been widely investigated through unsupervised-classification-based studies, there are few cases...
Feature Selection and Classification of Microarray Gene Expression Data of Ovarian Carcinoma Patients using Weighted Voting Support Vector Machine
DNA microarray gene expression gives access to a wealth of information on thousands of variables (genes). Analysis of this information can reveal the genetic causes of disease and of differences between tumors. In this study we try to reduce the high-dimensional data with a statistical method to select valuable, high-impact genes as biomarkers and then classify ovarian tumors based on gene expression data of...
Feature Selection Using Multi Objective Genetic Algorithm with Support Vector Machine
Different approaches have been proposed for feature selection to obtain a suitable feature subset from among all features. These methods search the feature space for feature subsets that satisfy some criteria or optimize several objective functions. The objective functions are divided into two main groups: filter and wrapper methods. In filter methods, feature subsets are selected due to some measu...
Modeling and design of a diagnostic and screening algorithm based on hybrid feature selection-enabled linear support vector machine classification
Background: In the current study, a hybrid feature selection approach involving filter and wrapper methods is applied to several bioscience databases with various records, attributes, and classes; the strategy thus combines the advantages of both methods, such as fast execution, generality, and accuracy. The purpose is to diagnose disease status and estimate patient survival. Method...